68 research outputs found
Robust Accelerating Control for Consistent Node Dynamics in a Platoon of CAVs
Driving as a platoon has the potential to significantly benefit traffic capacity and safety. To generate more nearly identical node dynamics for a platoon of connected and automated vehicles (CAVs), this chapter presents a robust acceleration controller using a multiple-model control structure. The large uncertainties of node dynamics are divided into smaller ones using multiple uncertain models, and accordingly multiple robust controllers are designed. Based on the errors between the current node and the multiple models, a scheduling logic is proposed that automatically switches the most appropriate candidate controller into the loop. Even under relatively large plant uncertainties, this method offers consistent and approximately linear dynamics, which simplifies the synthesis of the upper-level platoon controller. The method is validated by comparative simulations against a sliding mode controller and a fixed H∞ controller.
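The scheduling logic described above can be sketched as follows: each candidate model predicts the node's next response, and the controller paired with the best-matching model is switched in. This is a minimal illustration under assumed first-order lag models with invented time constants, not the chapter's actual controller bank.

```python
import numpy as np

# Hypothetical sketch: several candidate models, each paired with a robust
# controller; the candidate whose one-step prediction best matches the
# measured node response is selected. Parameters are illustrative only.

def select_controller(y_meas, y_preds):
    """Return the index of the model with the smallest prediction error."""
    errors = [abs(y_meas - y_hat) for y_hat in y_preds]
    return int(np.argmin(errors))

# Three candidate first-order models y' = (u - y) / tau with different lags.
taus = [0.3, 0.5, 0.8]
dt, u, y = 0.05, 1.0, 0.0
tau_true = 0.52                 # "unknown" true powertrain lag of the node

for _ in range(100):
    y_preds = [y + dt * (u - y) / tau for tau in taus]   # model predictions
    y = y + dt * (u - y) / tau_true                      # true plant step
    k = select_controller(y, y_preds)                    # scheduling decision

print(k)   # index of the best-matching candidate model
```

Because the true lag (0.52) is closest to the second candidate (0.5), the scheduler consistently keeps that model's controller in the loop.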
Feasible Policy Iteration
Safe reinforcement learning (RL) aims to solve an optimal control problem
under safety constraints. Existing safe RL methods use the
original constraint throughout the learning process. They either lack
theoretical guarantees of the policy during iteration or suffer from
infeasibility problems. To address this issue, we propose a safe RL method
called feasible policy iteration (FPI) that
iteratively uses the feasible region of the last policy to constrain the
current policy. The feasible region is represented by a feasibility function
called constraint decay function (CDF). The core of FPI is a region-wise policy
update rule called feasible policy improvement, which maximizes the return
under the constraint of the CDF inside the feasible region and minimizes the
CDF outside the feasible region. This update rule is always feasible and
ensures that the feasible region monotonically expands and the state-value
function monotonically increases inside the feasible region. Using the feasible
Bellman equation, we prove that FPI converges to the maximum feasible region
and the optimal state-value function. Experiments on classic control tasks and
Safety Gym show that our algorithms achieve lower constraint violations and
comparable or higher performance than the baselines.
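The region-wise update rule described above can be illustrated on a toy chain of states. Inside the (estimated) feasible region, the policy maximizes return subject to staying feasible; outside it, the policy minimizes the constraint decay function (CDF). The chain, rewards, and CDF values below are invented for illustration, not the paper's benchmarks.

```python
import numpy as np

# Toy sketch of feasible policy improvement on a 1-D chain:
# inside the feasible region, maximize return over actions that remain
# feasible; outside it, pick the action that most reduces the CDF.

n_states, actions = 5, [-1, +1]            # move left / right on a chain
F = np.array([1.0, 0.5, 0.0, 0.0, 0.0])    # constraint decay function (CDF)
feasible = F <= 0.0                        # feasible region: CDF == 0
reward = np.array([0.0, 0.0, 0.1, 0.5, 1.0])

def step(s, a):
    return min(max(s + a, 0), n_states - 1)

policy = np.zeros(n_states, dtype=int)
for s in range(n_states):
    if feasible[s]:
        # keep only actions whose successor stays feasible, maximize reward
        safe = [a for a in actions if feasible[step(s, a)]]
        policy[s] = max(safe, key=lambda a: reward[step(s, a)])
    else:
        # infeasible state: steer toward the feasible region (minimize CDF)
        policy[s] = min(actions, key=lambda a: F[step(s, a)])

print(policy)
```

On this chain every state ends up moving right: infeasible states descend the CDF toward the feasible region, and feasible states chase reward without leaving it.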
Effect of Pulse-and-Glide Strategy on Traffic Flow for a Platoon of Mixed Automated and Manually Driven Vehicles
The fuel consumption of ground vehicles is significantly affected by how they are driven. Fuel-optimized vehicle automation can improve the fuel economy of the host vehicle, but its effectiveness for a platoon of vehicles is still unknown. This article studies the performance of a well-known fuel-optimized vehicle automation strategy, i.e., Pulse-and-Glide (PnG) operation, in terms of traffic smoothness and fuel economy in a mixed traffic flow. The mixed traffic flow is assumed to be a single-lane highway on a flat road consisting of both driverless and manually driven vehicles. The driverless vehicles are equipped with a fuel-economy-oriented automated controller using the PnG strategy. The manually driven vehicles are simulated using the Intelligent Driver Model (IDM) to mimic the average car-following behavior of human drivers in naturalistic traffic. A series of simulations is conducted with three scenarios, i.e., a single car, a car section, and a car platoon. The simulation results show that the PnG strategy can significantly improve the fuel economy of individual vehicles. For traffic flows, the fuel economy and traffic smoothness vary significantly under the PnG strategy.
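The IDM used above for the manually driven vehicles has a standard closed-form acceleration law. A minimal sketch, with typical textbook parameter values rather than the ones calibrated in the article:

```python
import math

# Intelligent Driver Model (IDM) acceleration law:
#   a = a_max * [1 - (v / v0)^4 - (s* / gap)^2]
#   s* = s0 + v*T + v*dv / (2 * sqrt(a_max * b))
# Parameter values here are common defaults, not the article's calibration.

def idm_accel(v, dv, gap, v0=33.3, T=1.5, a_max=1.0, b=2.0, s0=2.0):
    """v: own speed; dv: approach rate (v - v_lead); gap: bumper-to-bumper spacing."""
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    return a_max * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)

a = idm_accel(v=25.0, dv=0.0, gap=60.0)   # steady following with a large gap
print(round(a, 3))
```

With a large gap and no closing speed, the model produces a mild positive acceleration, as expected for a driver below the desired speed.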
Stability and scalability of homogeneous vehicular platoon: study on the influence of information flow topologies
In addition to decentralized controllers, the information flow among vehicles can significantly affect the dynamics of a platoon. This paper studies the influence of information flow topology on the internal stability and scalability of homogeneous vehicular platoons moving in a rigid formation. A linearized vehicle longitudinal dynamic model is derived using the exact feedback linearization technique, which accommodates the inertial delay of powertrain dynamics. Directed graphs are adopted to describe different types of allowable information flow interconnecting vehicles, including both radar-based sensors and vehicle-to-vehicle (V2V) communications. Under linear feedback controllers, a unified internal stability theorem is proved by using algebraic graph theory and the Routh-Hurwitz stability criterion. The theorem explicitly establishes the stabilizing thresholds of linear controller gains for platoons under a large class of different information flow topologies. Using matrix eigenvalue analysis, the scalability is investigated for platoons under two typical information flow topologies: 1) the stability margin of the platoon decays to zero as O(1/N^2) for the bidirectional topology; and 2) the stability margin is always bounded and independent of the platoon size for the bidirectional-leader topology. Numerical simulations are used to illustrate the results.
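The O(1/N^2) decay for the bidirectional topology can be checked numerically: that topology's information-flow graph is a path, and the stability margin is governed by the smallest nonzero eigenvalue of the path-graph Laplacian, which behaves like pi^2 / N^2. This is a simplified numerical illustration of the scaling, not the paper's full eigenvalue analysis.

```python
import numpy as np

# Smallest nonzero Laplacian eigenvalue of a path graph with N nodes.
# For a path, lambda_2 = 2 * (1 - cos(pi / N)) ~ pi^2 / N^2, so doubling
# N should shrink it by roughly a factor of 4.

def lambda2(N):
    L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    L[0, 0] = L[-1, -1] = 1.0          # path-graph (chain) Laplacian
    return np.linalg.eigvalsh(L)[1]    # eigenvalues ascend; [0] is 0

for N in (10, 20, 40):
    print(N, lambda2(N))               # decays roughly as 1/N^2
```

The ratio lambda2(10) / lambda2(20) is close to 4, confirming the quadratic decay of the stability margin with platoon size.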
Parallel Optimal Control for Cooperative Automation of Large-scale Connected Vehicles via ADMM
This paper proposes a parallel optimization algorithm for cooperative
automation of large-scale connected vehicles. The task of cooperative
automation is formulated as a centralized optimization problem taking the whole
decision space of all vehicles into account. Considering the uncertainty of the
environment, the problem is solved in a receding horizon fashion. Then, we
employ the alternating direction method of multipliers (ADMM) to solve the
centralized optimization in a parallel way, which scales more favorably to
large-scale instances. In addition, a Taylor expansion is used to linearize the
nonconvex coupling collision-avoidance constraints among interacting vehicles.
Simulations with two typical traffic scenes for multiple vehicles demonstrate
the effectiveness and efficiency of our method.
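The parallel decomposition described above can be sketched with a consensus-ADMM toy problem: each agent keeps a local copy of a shared decision variable and minimizes its own cost in parallel, while the averaging and dual steps drive the copies to agreement. The scalar quadratic costs below are illustrative stand-ins for the (linearized) per-vehicle subproblems, not the paper's formulation.

```python
import numpy as np

# Consensus ADMM for minimize sum_i (x - a_i)^2 over a shared x.
# Each local update below is independent, so it could run on each
# vehicle in parallel; the solution is the mean of the a_i.

a = np.array([1.0, 2.0, 6.0])          # local cost minimizers, f_i = (x - a_i)^2
rho = 1.0                              # ADMM penalty parameter
x = np.zeros_like(a)                   # local copies (solved in parallel)
z = 0.0                                # consensus variable
u = np.zeros_like(a)                   # scaled dual variables

for _ in range(200):
    x = (2.0 * a + rho * (z - u)) / (2.0 + rho)   # local minimization step
    z = np.mean(x + u)                            # consensus (averaging) step
    u = u + x - z                                 # dual ascent step

print(round(float(z), 4))              # converges to mean(a) = 3.0
```

In the paper's setting the local step would be each vehicle's trajectory subproblem and the consensus step would enforce the coupling constraints; receding-horizon replanning would wrap this whole loop.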
Safe Reinforcement Learning with Dual Robustness
Reinforcement learning (RL) agents are vulnerable to adversarial
disturbances, which can deteriorate task performance or compromise safety
specifications. Existing methods either address safety requirements under the
assumption of no adversary (e.g., safe RL) or only focus on robustness against
performance adversaries (e.g., robust RL). Learning one policy that is both
safe and robust remains a challenging open problem. The difficulty is how to
tackle two intertwined aspects in the worst cases: feasibility and optimality.
Optimality is only valid inside a feasible region, while identification of
maximal feasible region must rely on learning the optimal policy. To address
this issue, we propose a systematic framework to unify safe RL and robust RL,
including problem formulation, iteration scheme, convergence analysis and
practical algorithm design. This unification is built upon constrained
two-player zero-sum Markov games. A dual policy iteration scheme is proposed,
which simultaneously optimizes a task policy and a safety policy. The
convergence of this iteration scheme is proved. Furthermore, we design a deep
RL algorithm for practical implementation, called dually robust actor-critic
(DRAC). The evaluations with safety-critical benchmarks demonstrate that DRAC
achieves high performance and persistent safety under all scenarios (no
adversary, safety adversary, performance adversary), outperforming all
baselines significantly.
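The worst-case coupling described above can be illustrated with a minimax (robust) value-iteration backup on a tiny tabular game: the agent maximizes return while an adversary picks the worst disturbance at every step. The 2-state MDP below is invented for illustration; DRAC's actual dual actor-critic training on constrained zero-sum Markov games is far richer than this sketch.

```python
import numpy as np

# Robust value iteration: V(s) = max_a min_d [ R(s,a,d) + gamma * V(s') ].
# The agent picks task action a; the adversary picks disturbance d.

gamma = 0.9
# R[s, a, d]: reward for state s, agent action a, adversary disturbance d
R = np.array([[[1.0, 0.2], [0.5, 0.4]],
              [[0.0, -1.0], [0.3, 0.1]]])
# S_next[s, a, d]: successor state index (deterministic toy dynamics)
S_next = np.array([[[0, 1], [1, 0]],
                   [[0, 0], [1, 1]]])

V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * V[S_next]          # Q[s, a, d]
    V = Q.min(axis=2).max(axis=1)      # adversary minimizes, agent maximizes

print(np.round(V, 3))
```

The backup converges to the worst-case value of each state; in a dual scheme like the one above, an analogous backup with a safety cost would shape the safety policy while this one shapes the task policy.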
A Survey on Cooperative Longitudinal Motion Control of Multiple Connected and Automated Vehicles
- …